feat(up): Foundry auto-setup, best-model selection, memory CRD parity + fix kars up hang #454
Merged
… + fix kars up hang

Make `kars up --foundry-endpoint` actually set up a BYO Foundry project for Memory Store, stop hardcoding a stale model, and fix the post-deploy hang.

Foundry auto-setup (new cli/src/commands/up/foundry_setup.ts):
- Discover the project; list deployed models (ARM control-plane, no Graph).
- Pick the BEST deployed chat model instead of hardcoded gpt-4.1 (pure, tested ranking; --model always wins). Excludes embedding/image/audio.
- Ensure an embedding model (Memory Store needs one); best-effort deploy text-embedding-3-small if absent.
- Enable the project's system-assigned managed identity if missing (Memory Store authenticates internally as the project MI), then re-read principalId for the existing Azure AI User RBAC grant. All idempotent + non-fatal.

CRD parity + status:
- Emit a KarsMemory binding CR on `kars up` (Foundry endpoints only), matching what `kars dev` already creates (refs.ts buildKarsMemory/memoryRefName).
- Print a CRD status report (InferencePolicy/ToolPolicy/KarsMemory/KarsSandbox).

Fix the hang (two causes):
- cli/src/preflight.ts: the RBAC spinner was only concluded when fetchSubscriptionPermissions threw or returned a non-empty set; an empty [] left it spinning, and its setInterval kept Node alive, so `kars up` hung after the summary with the spinner still animating. Conclude it on the empty path. Also fix a second identical leak in the provider notFound path.
- up.ts: process.exit(0) on success (belt-and-suspenders for the detached kubectl port-forward handle).

Memory error unmasking (runtime):
- foundry.ts ensureStore uses the STRICT router call for POST /memory_stores so the real 403/400 surfaces (MI not enabled / RBAC propagating / no embedding model) instead of the generic "could not be created".

Security audit: docs/internal/security-audits/2026-06-25-foundry-autosetup-bestmodel-memory-spinner.md (2 sign-offs).
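The spinner leak described under "Fix the hang" can be illustrated with a small sketch. This is not the code in `cli/src/preflight.ts`; `Spinner`, `checkRbac`, and the result labels are hypothetical names used only for illustration. The commit concludes the spinner explicitly on the empty path; the sketch uses `try`/`finally` instead, which is one way to make sure the interval is cleared on every path (throw, empty result, non-empty result).

```typescript
// Hypothetical stand-in for a CLI spinner. While the interval is active,
// it keeps the Node event loop (and therefore the process) alive.
class Spinner {
  private timer: ReturnType<typeof setInterval> | undefined;

  start(): void {
    this.timer = setInterval(() => {
      /* a real spinner would redraw a frame here */
    }, 80);
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  get active(): boolean {
    return this.timer !== undefined;
  }
}

type RbacResult = "ok" | "unknown" | "error";

// Assumed shape: fetchPermissions resolves to a list of permission strings.
// An empty list is a valid, non-throwing outcome and must still stop the spinner.
async function checkRbac(
  fetchPermissions: () => Promise<string[]>,
  spinner: Spinner,
): Promise<RbacResult> {
  spinner.start();
  try {
    const permissions = await fetchPermissions();
    return permissions.length > 0 ? "ok" : "unknown";
  } catch {
    return "error";
  } finally {
    // Runs for the thrown, empty, and non-empty cases alike.
    spinner.stop();
  }
}
```

With this shape, `await checkRbac(async () => [], spinner)` resolves to `"unknown"` and leaves no timer behind, so the process can exit on its own without relying on `process.exit(0)`.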
Verification: CLI tsc+oxlint clean, 831 tests (+10); runtime tsc+oxlint clean, 244 tests; model ranking validated against the live azureclaw-foundry set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
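For readers who want a feel for the "pure, tested ranking" mentioned in the commit message, here is a minimal sketch. It is not the ranking in `foundry_setup.ts`: the `Deployment` shape, the `NON_CHAT` fragment list, and the "highest version number wins" heuristic are all assumptions made for illustration. The parts taken from the PR are that an explicit `--model` value always wins and that embedding/image/audio models are excluded.

```typescript
// Hypothetical shape of a deployed model entry.
interface Deployment {
  name: string; // deployment name
  model: string; // underlying model name, e.g. "gpt-4.1"
}

// Assumed name fragments that mark a model as non-chat.
const NON_CHAT = ["embedding", "dall-e", "image", "whisper", "tts", "audio"];

function isChatModel(model: string): boolean {
  const lower = model.toLowerCase();
  return !NON_CHAT.some((fragment) => lower.includes(fragment));
}

// Assumed heuristic: take the first dotted number in the name as the version,
// so "gpt-4.1" becomes [4, 1]. Names without digits rank lowest.
function versionOf(model: string): number[] {
  const match = model.match(/\d+(?:\.\d+)*/);
  return match ? match[0].split(".").map(Number) : [0];
}

function compareVersions(a: number[], b: number[]): number {
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Pure function: no I/O, so it can be unit-tested against a fixed list.
function pickBestChatModel(
  deployments: Deployment[],
  override?: string,
): string | undefined {
  if (override) return override; // an explicit --model always wins
  const chat = deployments.filter((d) => isChatModel(d.model));
  if (chat.length === 0) return undefined;
  const sorted = [...chat].sort((x, y) =>
    compareVersions(versionOf(y.model), versionOf(x.model)),
  );
  return sorted[0].name;
}
```

A version-number heuristic like this is easy to fool: a name such as `gpt-35-turbo` would parse as version 35 and outrank everything else, so a real ranking needs a more careful notion of model family and generation.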
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Summary
Makes `kars up --foundry-endpoint` actually set up a BYO Foundry project so Memory Store works out of the box, stops hardcoding a stale model, fixes the post-deploy hang, and surfaces previously-masked memory errors.

Surfaced from real clean-room runs (Pal + @laevenso). Not for merge until tested.

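Foundry auto-setup (new `cli/src/commands/up/foundry_setup.ts`)
- Discovers the project and lists deployed models (ARM control-plane via `az` token; no Graph).
- Picks the best deployed chat model instead of hardcoded `gpt-4.1` (pure, tested ranking; `--model` always wins; excludes embedding/image/audio). On the live `azureclaw-foundry` set this picks `gpt-5.4`.
- Ensures an embedding model (Memory Store needs one); best-effort deploys `text-embedding-3-small` if absent.
- Enables the project's system-assigned managed identity if missing (Memory Store authenticates internally as the project MI), then re-reads `principalId` for the existing `Azure AI User` RBAC grant.

CRD parity + status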
- Emits a `KarsMemory` binding CR on `kars up` (Foundry endpoints only), matching what `kars dev` already creates.
- Prints a CRD status report (`InferencePolicy`/`ToolPolicy`/`KarsMemory`/`KarsSandbox`).

Fix the `kars up` hang (two causes)
- `cli/src/preflight.ts`: the RBAC spinner was only concluded when `fetchSubscriptionPermissions` threw or returned a non-empty set. An empty `[]` (no throw) left it spinning; its `setInterval` kept Node alive, so `kars up` hung after the summary with the spinner still animating (reproduced by two operators). Now concluded on the empty path. A second identical leak in the provider `notFound` path is also fixed.
- `up.ts`: `process.exit(0)` on success (belt-and-suspenders for the detached `kubectl port-forward` handle).

Memory error unmasking (runtime)
- `ensureStore` uses the strict router call for `POST /memory_stores` so the real 403/400 surfaces (MI not enabled / RBAC still propagating / no embedding model) instead of the generic "could not be created".

Security audit
`docs/internal/security-audits/2026-06-25-foundry-autosetup-bestmodel-memory-spinner.md` (2 sign-offs). No new role/scope/principal: the two writes are operator-scoped, idempotent, best-effort, on their own Foundry resource. `security-audit-required` + `copyright-headers` pass locally.

Verification
- CLI: `tsc` clean, oxlint 0 errors, 831 tests (+10 new).
- Runtime: `tsc` clean, oxlint 0 errors, 244 tests.

Note
The memory-unmask change lives in the sandbox image, so it needs `kars push --only sandbox --apply` (or the release build) to reach a running pod. The `kars up` changes are CLI-only and effective immediately.
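
The difference between a lenient and a strict call, which is what the memory-unmask change relies on, can be sketched as follows. This is not the runtime's router API: `Send`, `HttpStatusError`, `callLenient`, and `callStrict` are hypothetical names, and the response shape is an assumption. The point is only that a lenient wrapper collapses every failure into "nothing came back", while a strict one keeps the status code and body so a 403 or 400 can be reported as such.

```typescript
// Assumed minimal response shape and transport function.
interface HttpResponse {
  status: number;
  body: string;
}
type Send = (method: string, path: string, payload?: unknown) => Promise<HttpResponse>;

// Error type that preserves the upstream status and body.
class HttpStatusError extends Error {
  status: number;
  body: string;

  constructor(status: number, body: string) {
    super(`HTTP ${status}: ${body}`);
    this.name = "HttpStatusError";
    this.status = status;
    this.body = body;
  }
}

// Lenient: any failure becomes undefined, so the caller can only say
// something generic like "could not be created".
async function callLenient(
  send: Send,
  method: string,
  path: string,
  payload?: unknown,
): Promise<string | undefined> {
  try {
    const res = await send(method, path, payload);
    return res.status >= 200 && res.status < 300 ? res.body : undefined;
  } catch {
    return undefined;
  }
}

// Strict: non-2xx responses throw with the real status and body attached.
async function callStrict(
  send: Send,
  method: string,
  path: string,
  payload?: unknown,
): Promise<string> {
  const res = await send(method, path, payload);
  if (res.status < 200 || res.status >= 300) {
    throw new HttpStatusError(res.status, res.body);
  }
  return res.body;
}
```

A caller creating a store through `callStrict(send, "POST", "/memory_stores", ...)` can catch `HttpStatusError` and print the status and body, which is where causes such as a missing managed identity or a still-propagating RBAC grant would become visible.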